Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis
Article Data
History: submitted 15 October 2020; accepted 16 August 2021; published online 2021.
Keywords: mean-field control, multi-agent reinforcement learning, Q-learning, cooperative games, dynamic programming principle
AMS Subject Headings: 49N80, 68Q32, 68T05, 90C40
Publication Data: ISSN (online) 2577-0187; Publisher: Society for Industrial and Applied Mathematics; CODEN: sjmdaq
Similar Resources
Fastest Convergence for Q-learning
The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins’ original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-s...
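The matrix-gain, two-time-scale idea behind Zap-style algorithms can be illustrated on a toy problem. The following is a loose sketch, not the Zap Q-learning algorithm itself: a gain matrix is estimated on a faster time scale, and the parameter takes Newton-Raphson-style steps with its inverse. The linear system, step sizes, and noise model are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: find theta solving A theta = b from noisy observations
# of (A, b). The gain matrix A_hat is a running estimate of A, updated
# on a faster time scale; theta moves by Newton-Raphson-style steps.
A_true = np.array([[3.0, 1.0], [1.0, 2.0]])
b_true = np.array([1.0, 1.0])

theta = np.zeros(2)
A_hat = np.eye(2)                                 # gain-matrix estimate
for n in range(1, 5001):
    A_obs = A_true + 0.1 * rng.standard_normal((2, 2))  # noisy samples
    b_obs = b_true + 0.1 * rng.standard_normal(2)
    beta = n ** -0.85                             # faster time scale (gain)
    alpha = 1.0 / n                               # slower time scale (theta)
    A_hat += beta * (A_obs - A_hat)
    residual = b_obs - A_obs @ theta              # noisy residual of A theta = b
    theta += alpha * np.linalg.solve(A_hat, residual)   # matrix-gain step
```

The point of the matrix gain is variance reduction: preconditioning the step with an estimate of the mean vector field's Jacobian mimics a deterministic Newton-Raphson iteration.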
Cooperative Mean-Field Type Games
In the standard formulation of a game, a player's payoff function depends on the states and actions of all the players. Yet real-world applications suggest also considering a functional of the probability measure of the states and actions of all the players. In this paper, we consider cooperative mean-field type games in which the state dynamics and the payoffs depend not only on the state and act...
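The dependence of payoffs on the measure of states can be made concrete with a small sketch. A finite state space, the target distribution, and the quadratic penalty below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def mean_field_payoff(states, actions, target_dist, c_action=0.1):
    """Per-agent cost that depends on the empirical measure of all
    agents' states, not only on each agent's own state and action."""
    states = np.asarray(states)
    n_states = len(target_dist)
    # empirical (mean-field) measure of the population's states
    mu = np.bincount(states, minlength=n_states) / states.size
    # mean-field term: squared distance of the measure from a target
    mf_cost = np.sum((mu - np.asarray(target_dist)) ** 2)
    # individual term: a quadratic control cost, one entry per agent
    ind_cost = c_action * np.asarray(actions, dtype=float) ** 2
    return ind_cost + mf_cost
```

Because every agent's cost contains the same functional of the empirical measure, unilateral deviations change each agent's cost through the population distribution as well as through its own action.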
On Mean Field Convergence and Stationary Regime
Assume that a family of stochastic processes on some Polish space E converges to a deterministic process; the convergence is in distribution (hence in probability) at every fixed point in time. This assumption holds for a large family of processes, among which are many mean-field interaction models, and it is weaker than previously assumed. We show that any limit point of an invariant probability of th...
Expertness based cooperative Q-learning
By using other agents' experiences and knowledge, a learning agent may learn faster, make fewer mistakes, and create rules for unseen situations. These benefits are gained if the learning agent can extract proper rules from the other agents' knowledge for its own requirements. One possible way to do this is to have the learner assign expertness values (intelligence-level values) ...
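One plausible way to weight other agents' knowledge by expertness is a normalized weighted average of their Q-tables. This is a minimal sketch under the assumption of tabular Q-values and nonnegative scores; the paper's actual weighting scheme may differ:

```python
import numpy as np

def combine_q_tables(q_tables, expertness):
    """Combine other agents' Q-tables, weighting each table by its
    agent's expertness score (normalized to sum to one)."""
    w = np.asarray(expertness, dtype=float)
    w = np.maximum(w, 0.0)              # ignore negative scores
    if w.sum() == 0:
        w = np.ones_like(w)             # fall back to a plain average
    w /= w.sum()
    return sum(wi * q for wi, q in zip(w, q_tables))
```

A learner would periodically replace (or blend) its own table with the combined one, so that more expert agents contribute more to its policy.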
Convergence of Optimistic and Incremental Q-Learning
Yishay Mansour. We show the convergence of two deterministic variants of Q-learning. The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees convergence to an ε-optimal policy. The second is a new and novel algo...
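A minimal tabular sketch of the optimistic variant: Q-values start at a large `q_init` and the agent always acts greedily, so optimism alone drives exploration. The environment interface, start state, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def optimistic_q_learning(env_step, n_states, n_actions, q_init=10.0,
                          alpha=0.1, gamma=0.9, episodes=500, horizon=100):
    """Optimistic Q-learning sketch: untried actions keep their large
    initial value, so the greedy policy visits them until their
    optimistic estimates are driven down toward realistic ones."""
    Q = np.full((n_states, n_actions), q_init)   # optimistic initialization
    for _ in range(episodes):
        s = 0                                    # illustrative fixed start state
        for _ in range(horizon):
            a = int(np.argmax(Q[s]))             # purely greedy action choice
            s_next, r, done = env_step(s, a)
            target = r if done else r + gamma * np.max(Q[s_next])
            Q[s, a] += alpha * (target - Q[s, a])
            if done:
                break
            s = s_next
    return Q
```

With pessimistic initialization the same greedy policy could lock onto the first rewarding action it finds; the large `q_init` is what makes exploration happen without any explicit randomization.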
Journal
عنوان ژورنال: SIAM journal on mathematics of data science
Year: 2021
ISSN: 2577-0187
DOI: https://doi.org/10.1137/20m1360700